Closure Operators and Spam Resistance for PageRank

Authors

  • Lucas Farach-Colton
  • Martin Farach-Colton
  • Reut Levi
  • Moti Medina
  • Miguel Mosterio
Abstract

We study the spammability of ranking functions on the web. Although graph-theoretic ranking functions such as Hubs and Authorities and PageRank exist, there is no graph-theoretic notion of how spammable such functions are. We introduce a very general cost model that depends only on the observation that changing the links of a page that you own is free, whereas changing the links on pages owned by others requires effort or money. We define spammability to be the ratio between the amount of benefit one receives for one's spamming efforts and the amount of effort/money one must spend to spam. The more effort/money it takes to get highly ranked, the less spammable the function. Our model helps explain why both Hubs and Authorities and standard PageRank are very easy to spam. Although standard PageRank is easy to spam, we show that there exist spam-resistant PageRanks. Specifically, we propose a ranking method, Min-k-PPR, that is the component-wise min of a set of personalized PageRanks centered on k trusted sites. Our main results are that Min-k-PPR is, itself, a type of PageRank and that it is expensive to spam. We elucidate a surprisingly elegant algebra for PageRank. We define the space of all possible PageRanks and show that this space is closed under some operations. Most notably, we show that PageRanks are closed under (normalized) component-wise min, which establishes that (normalized) Min-k-PPR is a PageRank. This algebraic structure is also key to demonstrating the spam resistance of Min-k-PPR.
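
As a rough illustration of the construction described in the abstract, the sketch below computes one personalized PageRank per trusted site, takes the component-wise min, and normalizes the result. The abstract does not give the paper's exact definitions, so several details here are assumptions made only for illustration: the function names `personalized_pagerank` and `min_k_ppr`, the damping factor of 0.85, the fixed iteration count, the choice to return the mass of dangling pages to the seed, and the toy graph itself.

```python
def personalized_pagerank(out_links, seed, damping=0.85, iterations=100):
    """Power iteration for a PageRank whose teleport mass all goes to `seed`.

    `out_links[p]` is the list of pages that page `p` links to.
    """
    n = len(out_links)
    # Start with all mass on the seed, so pages unreachable from it stay at 0.
    rank = [0.0] * n
    rank[seed] = 1.0
    for _ in range(iterations):
        nxt = [0.0] * n
        for page, targets in enumerate(out_links):
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    nxt[target] += share
            else:
                # Assumption: a dangling page returns its mass to the seed.
                nxt[seed] += damping * rank[page]
        # Teleport step: the remaining mass jumps back to the seed.
        nxt[seed] += 1.0 - damping
        rank = nxt
    return rank


def min_k_ppr(out_links, trusted, damping=0.85, iterations=100):
    """Normalized component-wise min of the PageRanks personalized on `trusted`."""
    vectors = [
        personalized_pagerank(out_links, seed, damping, iterations)
        for seed in trusted
    ]
    mins = [min(column) for column in zip(*vectors)]
    total = sum(mins)
    if total == 0.0:
        return mins  # Nothing is reachable from every trusted site.
    return [m / total for m in mins]


# Hypothetical toy web: pages 0 and 1 are trusted, page 2 is an ordinary page,
# and pages 3 and 4 form a link farm that only links to itself.
links = [[1, 2], [0, 2], [0], [4], [3]]
scores = min_k_ppr(links, trusted=[0, 1])
print(scores)
```

In this toy graph the link farm receives no score because neither trusted site links into it, directly or indirectly; rewiring pages 3 and 4 (which the spammer owns) does not change that. This is only meant to convey the intuition behind the cost model, not the paper's formal spam-resistance bound.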


Similar resources

Using Spam Farm to Boost PageRank

Today people have become more and more dependent on search engines such as Google, Yahoo, and MSN, etc., for their information needs. Web spamming has emerged to take the economic advantage of high search rankings and threatened the accuracy and fairness of those rankings. Understanding spamming techniques is essential for evaluating the strength and weakness of a ranking algorithm, and for fig...


SpamRank -- Fully Automatic Link Spam Detection

Spammers intend to increase the PageRank of certain spam pages by creating a large number of links pointing to them. We propose a novel method based on the concept of personalized PageRank that detects pages with an undeserved high PageRank value without the need of any kind of white or blacklists or other means of human intervention. We assume that spammed pages have a biased distribution of p...


M-FUZZIFYING MATROIDS INDUCED BY M-FUZZIFYING CLOSURE OPERATORS

In this paper, the notion of closure operators of matroids is generalized to fuzzy setting which is called $M$-fuzzifying closure operators, and some properties of $M$-fuzzifying closure operators are discussed. The $M$-fuzzifying matroid induced by an $M$-fuzzifying closure operator can induce an $M$-fuzzifying closure operator. Finally, the characterizations of $M$-fuzzifying acyclic matroi...


Link Spam Detection based on DBSpamClust with Fuzzy C-means Clustering

This Search engine became omnipresent means for ingoing to the web. Spamming Search engine is the technique to deceiving the ranking in search engine and it inflates the ranking. Web spammers have taken advantage of the vulnerability of link based ranking algorithms by creating many artificial references or links in order to acquire higher-than-deserved ranking in search engines' results. Link b...


SNUMedinfo at TREC Web track 2014

This paper describes the participation of the SNUMedinfo team at the TREC Web track 2014. This is the first time we participate in the Web track. Rather than applying more sophisticated retrieval method such as learning to rank models, this year we used only baseline retrieval models with spam filtering and pagerank prior.



Publication date: 2018